Analyzing and visualizing the semantic coverage of Wikipedia and its authors

Authors

  • Todd Holloway
  • Miran Bozicevic
  • Katy Börner

Abstract

This paper presents a novel analysis and visualization of English Wikipedia data. Our specific interest is the analysis of basic statistics, the identification of the semantic structure and age of the categories in this free online encyclopedia, and the content coverage of its highly productive authors. The paper starts with an introduction to Wikipedia and a review of related work. We then introduce a suite of measures and approaches to analyze and map the semantic structure of Wikipedia. The results show that co-occurrences of categories within individual articles follow a power-law distribution and, when mapped, reveal a well-clustered semantic structure of Wikipedia. The results also reveal the content coverage of the articles’ authors, although the roles these authors play are as varied as the authors themselves. We conclude with a discussion of major results and planned future work.

Summary of results for the nonspecialist: Wikipedia is a free ‘encyclopedia of everything’ that was started by Jimmy Wales on January 15, 2001. Less than five years after its creation, it comprises over 2,700,000 articles written by about 90,000 different contributors in 195 languages. This paper provides basic statistics and analyzes and maps the semantic structure of the English Wikipedia, as well as the activity of its major authors.

Holloway, Todd, Božicevic, Miran and Börner, Katy. (2007) Analyzing and Visualizing the Semantic Coverage of Wikipedia and Its Authors. Complexity, Special issue on Understanding Complex Systems. Vol. 12(3), pp. 30-40. Also available as cs.IR/0512085.
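The abstract's central quantitative claim concerns how often pairs of categories co-occur on the same article. The Python sketch below shows one way such counts could be tallied and a power-law exponent estimated; it is not the authors' code. It assumes each article is represented as a set of category names, the example articles and category names are hypothetical, and the exponent uses the approximate discrete maximum-likelihood estimator of Clauset, Shalizi and Newman (2009), which is a swapped-in technique rather than a method confirmed by the paper.

```python
from collections import Counter
from itertools import combinations
import math


def category_cooccurrences(articles):
    """Count how often each unordered pair of categories appears on the same article.

    `articles` is an iterable of category collections, one per article
    (an assumed, hypothetical input format).
    """
    pair_counts = Counter()
    for categories in articles:
        # Deduplicate and sort so ("A", "B") and ("B", "A") map to the same key.
        for pair in combinations(sorted(set(categories)), 2):
            pair_counts[pair] += 1
    return pair_counts


def weight_distribution(pair_counts):
    """Map each co-occurrence weight k to the number of category pairs with weight k."""
    return Counter(pair_counts.values())


def powerlaw_alpha(values, x_min=1):
    """Approximate discrete MLE of a power-law exponent for values >= x_min.

    Assumption: uses alpha ~= 1 + n / sum(ln(x / (x_min - 0.5))),
    the discrete approximation of Clauset, Shalizi and Newman (2009).
    """
    tail = [x for x in values if x >= x_min]
    if not tail:
        raise ValueError("no values at or above x_min")
    return 1 + len(tail) / sum(math.log(x / (x_min - 0.5)) for x in tail)


if __name__ == "__main__":
    # Hypothetical toy data: each set holds the categories of one article.
    articles = [
        {"Physics", "Science"},
        {"Physics", "Science", "History"},
        {"History", "Europe"},
        {"Physics", "Science"},
    ]
    counts = category_cooccurrences(articles)
    print(counts.most_common())
    print(weight_distribution(counts))
    print(powerlaw_alpha(counts.values()))
```

On a toy input this small the fitted exponent is not meaningful; a real analysis would run over the full set of article category assignments, choose `x_min` from the data, and test the goodness of fit before calling the distribution a power law.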

Similar sources

Participation and Scientific Collaboration in Persian Wikipedia

Background and Aim: This research studies effective participation and scientific collaboration in the Persian Wikipedia from 2003 to 2012. Method: The library method has been used. Given the objectives and the nature of the subject, the research is descriptive-applied, and scientometric techniques were used in its implementation. Excel and SPSS software have been used for...

Advertising Keyword Suggestion Using Relevance-Based Language Models from Wikipedia Rich Articles

When emerging technologies such as Search Engine Marketing (SEM) face tasks that require human-level intelligence, it becomes necessary to use knowledge repositories to endow the machine with the breadth of knowledge available to humans. Keyword suggestion for search engine advertising is an important problem for sponsored search and SEM that requires a rich repository of knowledge. A recen...

On the Problem of Lexical Semantic Change

The article provides insight into the problem of lexical semantic change. A short historical outline of the development of semantic studies is given. The authors analyze some of the most important stages in the history of the formation of this field. The existing approaches to dealing with form and meaning, namely the semasiological and onomasiological ones, are discussed. The authors consider the ...

Improving Persian Named Entity Recognition Using the Ezafe Construction

Named entity recognition is a process in which the names of people, places (cities, countries, seas, etc.) and organizations (public and private companies, international institutions, etc.), as well as dates, currencies and percentages, are identified in a text. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...

Extracting Semantics Relationships between Wikipedia Categories

Wikipedia is the largest online collaborative knowledge-sharing system, a free encyclopedia. Built upon traditional wiki architectures, its search capabilities are limited to title and full-text search. We suggest that semantic information can be extracted from Wikipedia by analyzing the links between categories. The results can be used for building a semantic schema for Wikipedia which cou...

Journal:
  • Complexity

Volume 12

Publication date: 2007